Speech-to-text transcription services can convert a voice mail to text and deliver it to the intended recipient through e-mail or text message. Conventionally, voice mail recording and voice transcription are separate processes that occur in series. For example, to record a voice message, voice data may be received in a certain format, converted to a further compressed format such as Global System for Mobile Communications (GSM), and stored in a WAV file, which refers to the Waveform Audio File Format developed by Microsoft and IBM. Upon completion of the recording, the compressed voice data may be transmitted for transcription. Upon receipt for transcription, the compressed voice data may be converted to a different format such as G.711, a standard of the Telecommunication Standardization Sector (ITU-T) of the International Telecommunication Union (ITU). The voice data may then be transcribed to readable text. Because transcription cannot begin until recording is complete, performing the two processes in series causes a delay between completion of the voice mail and transmission of the converted text to the intended recipient. Additionally, because the voice message is compressed for storage and later converted to another format (e.g., an uncompressed format) for transcription, audio quality may be degraded, which reduces the accuracy of the transcription. Thus, there is an ever-present need to reduce the delay in transmitting text transcribed from voice data to its intended recipient and to increase transcription accuracy.
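The quality loss described above can be illustrated with a minimal sketch. It assumes a simplified, continuous mu-law companding curve with 8-bit quantization, which only approximates the piecewise-linear tables that G.711 actually specifies, and it does not model GSM compression at all. The function names and the test tone are hypothetical and chosen for illustration only. The point is that a lossy encoding stage does not return the original samples when decoded, so each such stage in a serial pipeline can contribute error.

```python
import math

MU = 255  # mu-law companding parameter (assumption: continuous formula, not the G.711 tables)


def mulaw_encode(x: float) -> int:
    """Compress a sample in [-1.0, 1.0] to an 8-bit code in 0..255."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    # Logarithmic companding: y stays in [-1.0, 1.0]
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    # Uniform 8-bit quantization of the companded value
    return int(round((y + 1.0) / 2.0 * 255))


def mulaw_decode(code: int) -> float:
    """Expand an 8-bit code back to an approximate sample in [-1.0, 1.0]."""
    y = code / 255 * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def round_trip_error(samples) -> float:
    """Largest absolute difference between each sample and its decoded copy."""
    return max(abs(s - mulaw_decode(mulaw_encode(s))) for s in samples)


if __name__ == "__main__":
    # Hypothetical test signal: 10 ms of a 440 Hz tone sampled at 8 kHz
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(80)]
    print(f"max round-trip error: {round_trip_error(tone):.6f}")
```

The reported error is small but nonzero. In the conventional flow, the voice data passes through more than one such conversion before it reaches the transcriber, which is consistent with the accuracy concern raised above.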